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Abstract 



Generalized Linear Models (GLMs) are an increasingly popular framework for 
modeling neural spike trains. They have been linked to the theory of stochastic 
point processes and researchers have used this relation to assess goodness-of-fit 
using methods from point-process theory, e.g. the time-rescaling theorem. How- 
ever, high neural firing rates or coarse discretization lead to a breakdown of the as- 
sumptions necessary for this connection. Here, we show how goodness-of-fit tests 
from point-process theory can still be applied to GLMs by constructing equiva- 
lent surrogate point processes out of time-series observations. Furthermore, two 
additional tests based on thinning and complementing point processes are intro- 
duced. They augment the instruments available for checking model adequacy of 
point processes as well as discretized models. 



1 Introduction 



Action potentials are stereotyped all-or-nothing events, meaning that their amplitude is not consid- 
ered to transmit any information and only the exact time of occurrence matters. This view suggests 
to model neurons' responses in the mathematical framework of point processes. An observation 
is a sequence of spike times and their stochastic properties are captured by a single function, the 
conditional intensity Q[. For point processes on the time line, several approaches for evaluating 
goodness-of-fit have been proposed [2]. The most popular in the neuroscientific community has 
been a test based on the time-rescaling theorem (H. 

In practice, neural data is binned such that a spike train is represented as a sequence of spike counts 
per time bin. Specifically, Generalized Linear Models (GLMs) are built on this representation. Such 
discretized models of time series have mostly been seen as an approximation to continuous point 
processes and hence, the time-rescaling theorem was also applied to such models | 



Here we ask the question whether the time-rescaling theorem can be translated to discrete time. We 
review the approximations necessary for the transition to discrete time and point out a procedure 
to create surrogate point processes even when these approximations do not hold (section Two 
novel tests based on two different operations on point processes are introduced: random thinning 
and random complementing. These ideas are applied to a series of examples (section |3), followed 
by a discussion (section|4]i. 
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Figure 1: Spike train representations. (A) A trace of the membrane potential of a spiking neuron. 
(B) Information is conveyed in the timings and number of action potentials. This supports the 
representation of neural activity as a point process in which each spike is assumed to be a singular 
event in time. (C) When time is divided into large bins, the spike train is represented as a time series 
of discrete counts. (D) If the bin width is chosen small enough, the spike train corresponds to a 
binary time series, indicating the presence of a single spike inside a given time bin. 



2 Methods 



2.1 Representations of neural activity 

We characterize a neuron by its response in terms of trains of action potentials using the theory of 
point processes (Figures [TJ\ andQJj). An observation consists of a list of times, each denoting the 
time point of one action potential. Following a common notation yll3], let (0, T] be the time interval 
of the measurement and {ui} be the set of n event times. The stochastic properties of a point process 
are characterized by its conditional intensity function X(t\H(t)), defined as 



w,,„x ,. P[spikcin(M + A)|ff t ] m 
\{t\H t ) = lrm , (1) 

A-s-0 A 

where H t is the history of the stochastic process up to time t and possibly includes other covariates 
of interest. For fitting and evaluating different parameter sets of the conditional intensity function, a 
maximum-likelihood approach is followed JKjQjJ. The log-likelihood of a point process model is 
given by fl]: 



n „T 

log L (point process) = log A (m \ H Ui ) — / \{t\H t )dt. (2) 

One possibility are binning-free models (like renewal processes or other parametric models). Alter- 
natively, \(t\Ht) can be modeled as apiece-wise constant function with each piece having length A. 
In this case, the history term H t covers the history up to the time of the left edge of the current bin. 
Inside the bin, the process locally behaves like a Poisson process with constant rate = \{tk\Hf.) 
with t% = Afc and Hk = H tk . Using the number of spikes c,t per bin as a representation of the obser- 
vation, the discretized version of Equation|2]is equivalent to the log-likelihood of a series of Poisson 
samples (apart from terms that are not dependent on X(t\H t )). Hence, for finding the maximum- 
likelihood solution for the point process, it is equivalently sufficient to maximize the likelihood of 
such a Poisson regression model. The result of fitting will be a sequence of fa for each bin, where 
fa is the expected number of counts. Since a local Poisson process is assumed within the bins, fa is 
related to Xi via: A^ = fa/ A. 

A complementary approach to the point process framework is to see spike trains as time series, 
e. g. as a sequence of counts {cj} or binary events {hi] (Figures[T]Z! andQJ)). For Poisson-GLMs, 
a sequence of Poisson-distributed count variables Cj is modeled and the linear sum of covariates 
is linked to the expected mean of the Poisson distribution fa. Binary time series can be modeled 
as a sequence of conditionally independent Bernoulli trials with outcomes and 1 and success 
probabilities {pi}. For Bernoulli-GLMs, the pi& are linked via a non-linear transfer function to 
a linear sum of covariates. Defined this way, the likelihood for an observed sequence hi given a 
particular model of pi is given by log L (Bernoulli) = J^k bk log ^ + J] fe log(l — pk). In the 
approximation of fa <C 1, fa becomes approximately pi and the likelihoods of the Bernoulli and 
Poisson series become equivalent. Moreover, using the same approximation, it is possible to link 
the Bernoulli series to the conditional intensity function \(t\H t ) via A; « Pi/A . Traditionally, 
this path was chosen to relate the time series to the theory of point processes and to be able to use 
goodness-of-fit analyses available for such point processes @j. 
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Figure 2: Overview of goodness-of-fit tests for point-process models. (A) Using the time-rescaling 
theorem, the time of each spike is rescaled according to the integral of the conditional intensity 
function. (B) Assuming that the conditional intensity function has a lower limit B, spikes of the 
original spike train are thinned by keeping a spike only with probability BX' 1 . (C) Assuming that 
the conditional intensity function has an upper limit C, a complementary process A c = C — A can 
be constructed. Adding samples from this inhomogeneous Poisson process to the observed spikes 
results in a homogeneous Poisson process with rate C. 



2.2 Goodness-of-fit tests for point processes 



Statistical tests are usually evaluated using two measures: The specificity (fraction of correct models 
that pass the test) and the sensitivity or test power (fraction of wrong models that are properly 
rejected by the test). The specificity is set by the significance level: With significance level a, the 
specificity is 1 — a. The sensitivity of a given test depends on the strength of the departure from the 
modeled intensity function to the true intensity. 



2.2.1 The time-rescaling theorem 

A popular way for verifying point-process-based models has been the time-rescaling theorem f3l [l2ll . 
It states that if {ui} is a realization of events from a point process with conditional intensity A(t|i?t), 
then rescaling via the transformation u\ = L ' \(t\H t )dt will yield a unit-rate Poisson process. 

We call the following transformation the naive time-rescaling when it is applied to binary sequences. 
The spike time U{ falling into bin j, is transformed into: u' { = Ylk=i Pk- 



2.2.2 Thinning point processes 



It is well known that an inhomogeneous point process can be simulated by generating a homo- 
geneous Poisson process with constant intensity C with C > maxA(f) (the so-called dominant 
process) and keeping every spike at time U with probability p = lfl3l Q]. In reverse, this 

can be used to do model-checking [fl4fl : Let B be a lower bound of the fitted conditional intensity 
X(t\H(t)). Now take \(t\H(t)) as the dominant process with samples Uj. Thin the process by keep- 
ing a spike with probability wfgn • ^ or a correct ly specified model X(t\H t ), the thinned process 
will be a homogeneous Poisson process with rate B (Figure[Z}3). 
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Typically, B = min A(t) <C A(i) (due to absolute refractoriness in most renewal process models 
and GLMs), such that the thinned process will have a prohibitively low rate and only very few spikes 
will be selected. Testing the Poisson hypothesis on a handful of spikes will result in a vanishingly 
low power. 

To circumvent this problem, we propose the following remedy: Let B* be a threshold which may 
be higher than the lower bound B. Then consider only the intervals of A for which A > B* and 
concatenate those into a new point process. After applying the thinning procedure on all spikes of 
the stitched process, the thinned process should be a Poisson process with rate B*. This procedure 
can be repeated K times for a range of uniformly spaced B*s ranging from B to C (upper bound). 
Stretching each thinned process by a factor of B* creates a set of K unit-rate processes. Each of 
them is tested for the Poisson hypothesis by a Kolmogorov-Smirnov test on the inter-spike intervals. 
The model is rejected when there is at least one significant rejected null hypothesis. To correct for 
the multiple tests, we employ Simes' procedure. It tests the global null hypothesis that all tested 
sub-hypotheses are true against the alternative hypothesis that at least one hypothesis is false. To 

this end, it transforms the ordered list of p-values p^, ...,p( K ^ into — , Kp 2 , -^-^ — . If any 
of the transformed p-values is less than the significance level a = .05, the model is rejected lfl5lPl 

2.2.3 Complementing point processes 

The idea of thinning might also be used the other way round. Assume the observations U j have been 
generated by thinning a homogeneous Poisson process with rate C using the modeled conditional 
intensity X(t\H t ) as the lower bound. Then we can define a complementary process X c (t) = C — 
\(t\H t ) such that adding spikes from the complementary point process to the observed spikes, the 
resulting process will be a homogeneous Poisson process with rate C. This algorithm is a straight- 
forward inversion of the thinning algorithms discussed in 10, Q]]. 

It might happen that the upper bound C of the modeled intensity is much larger than the average 
\{t). In that case, the observed spike pattern would be distorted with high number of Poisson spikes 
from the complementary process and the test power would be low. To avoid this, a similar technique 
as for the thinning procedure can be employed. Define a threshold C* < C and consider only the 
region of the spike train for which X(t\H(t)) < C*. Apply the complementing procedure on these 
parts of the spike train to obtain a point process with rate C* when concatenating the intervals. This 
process can be repeated K times with values C* ranging from B to C. A multiple-test correction 
has to be used, again we propose Simes' method (see previous section). 

2.3 Creating surrogate point processes from time series 

Since the time-rescaling theorem can only be used when \{t\H t ) the exact spike times {ui} are 
known, it is not a priori clear how it applies to discretized time-series models. For such cases, 
we propose to generate surrogate point process samples that are equivalent to the observed time 
series. To apply the time-rescaling theorem on discretized models such as GLMs, the integral of the 
time transformation is replaced by a discrete sum over bins (the naive time-rescaling). Taking the 
simplest example of a homogeneous Poisson process, it is evident that the possible values for the 
rescaled intervals form a finite set. This contradicts the time-rescaling theorem that states that the 
intervals are (continuously) exponentially distributed. Hence, using the time-rescaling theorem on 
discretized data produces a bias lfl7ll . 

While Haslinger et al. considered a modification of the time-rescaling theorem to explicitly ac- 
count for the discrete nature of the model 11711 . we propose a general, simple scheme how to form 
surrogate point processes from Poisson- and Bernoulli-GLMs that can be used for the continuous 
time-rescaling theorem as well as for any other goodness-of-fit test designed for point-process data 
(Figure |3). 

Poisson-GLMs: The observation consists of a sequence of count variables Cj that is modeled as 
a sample from Poisson distributions with mean jttj. Hence, the modeled process can be regarded 
as a piecewise-constant intensity function. The expected number of spikes of a Poisson process is 
related to its intensity via /i^ = A^ A such that we can construct the conditional intensity function as 

'The K tests contain overlapping regions of the same spike train, hence, we expect the statistical tests to be 
correlated. In these cases, a simple Bonferroni-correction would be too conservative flHl . 
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binning - free model 
spike times {u. }; conditional intensity X(t I H{t)) 



Bernoulli- GLM 
binary observations [b ; }; spiking probabilities [p i } 



Poisson - GLM 
spike counts { c ; ) ; expected counts { fi { } 



draw {c, ) from P^' M \c l =k\k>\) 



draw spiketimes (u 1 } from Unif(Q,A) inside each bin: set A ( = 



apply goodness - of - fit procedures based on point process {u j and conditional intensity /i(t I H(t)) 



Figure 3: Creating surrogate point processes from time series. For bin-free point process models 
for which the spike times and a conditional intensity X(t\H(t)) is available, goodness-of-fit tests 
for point processes can be readily applied. For Poisson-GLMs, exact spike times are drawn inside 
each bin for the specified number of spikes that were observed. The piece-wise constant conditional 
intensity function is linked to the modeled number of counts per bin via A,; = A -1 /^. For Bernoulli- 
GLMs, the probability of obtaining at least one spike per bin pi is modeled. For each bin with spikes 
(hi = 1) - assuming a local Poisson process - a sample Cj from a biased Poisson distribution with 
mean /i; = — ln(l — pi) is drawn together with corresponding spike times. Finally, point-process 
based goodness-of-fit tests may be applied to this surrogate spike train. 



piece-wise constant with values A; = A -1 /^. Conditioned on the number of spikes that occurred in 
a homogeneous Poisson process of rate \, the exact spike times are uniformly distributed inside bin 
i. A surrogate point process can be constructed from a Poisson-GLM by generating random spike 
times (i — 1 + Unif(0, 1)) A for each spike within bin i (1 < i < N) for all bins with Cj > 0. One 
can then proceed to the point-process-based goodness-of-fit tools using the surrogate spike train and 
its conditional intensity A;. 

Bernoulli-GLMs: Based on the observed binary spike train {bi}, the sequence of probabilities pi 
of spiking within bin i is modeled. We can relate this to the point process framework using the 
following observations: Assume that pi denotes the probability of finding at least one spike within 
each birQ and that locally, the process behaves like a Poisson process. Then, Pl =P<£ oisaoa \x > 
1) = 1 — p^P olsson ) (x = 0) = 1 — exp(— fii). The conditional intensity is given by Ai = A _1 /ij = 
—A" 1 ln(l — pi). In practice, for each bin with bi = 1, we draw the amount of spikes within the 
bin by first sampling from the distribution p^P olsson )^ _ > ^ an( j sa mple exact spike times 
uniformly as in the case of the Poisson-GLMs. 

3 Results 

Here, we compare the performance of the three different approaches in detecting wrongly specified 
models, using examples of models that are commonly applied in neural data analysis. For the 
thinning and complementing procedure, K = 10 partitions were chosen (see section l2".2.21 i. Unless 
otherwise noted, we report the test power at a specificity of 1 — a = .95. The Poisson hypothesis in 
the proposed procedures is tested by a Kolmogorov-Smirnov test on the inter-spike intervals of the 
transformed process. 

3.1 Example: Inhomogeneous Poisson process 

Consider an inhomogeneous Poisson process with band-limited intensity: X(t\H t ) — \(t) — 
20 Hz + Ei=i° u j ^ll^T)^ with / = 1 Hz and J — 40 coefficients that were randomly 

drawn from a uniform distribution on the interval [0, 20]. The process was simulated over a length 
of T = 20 s and the intensity was discretized with A = 1 ms. Negative intensities were clipped 

2 Such clipping is implicitly performed in many studies, e. g. in [ 18, l^. l20n . 



5 



150 




5 10 15 

time [s] 

(a) intensity function 







— complementing 




— thinning 




— rescaling 




— naive rescaling 



10 



20 30 
jitter strength 



40 



(b) test power 



FT 



i 0.6 



-complementing 
— thinning 
— rescaling 
-naive rescaling 



0.5 
1 -specificity 

(c) ROC curve 



1 



Figure 4: Inhomogeneous Poisson process. (A) Sample intensity functions for an undistorted inten- 
sity (black line) and two models with jitters in the coefficients (f3 — 12, medium jitter and /3 = 30, 
large jitter). (B) The test power of each test as a function of the jitter strength. The dashed line 
indicates the level of the medium jitter strength (red line in figure A). (C) ROC curve analysis for 
an intermediate jitter strength of j3 = 12. The intersection of the curves with the dashed line corre- 
sponds to the test power at a significance level of a = .05. 



to zero. A binary spike train was generated by calculating the probability of at least one spike in 
each time bin as pi = \ — exp(— A(tj)A) and drawing samples from a Bernoulli distribution with 
specified probabilities p,. 

For evaluating the different algorithms, wrong models for the intensity were created with jittered 
coefficients u' k = Uk + /3Unif (— 1, 1) where (3 indicates the strength of the deviation from the true 
model. For each jitter strength, N = 1000 spike trains were generated from the true model and 
X(t\H t ) was constructed using the wrong model (FigureHK). For any /3 > 0, the fraction of rejected 
models defines the sensitivity or test power. For j3 = 0, the fraction of accepted models defines the 
specificity which was controlled to be at 1 — a = .95 for each test. 

All three methods (rescaling, thinning, complementing) show a specified type-I error of approx- 
imately 5% (j3 = 0) and progressively detect the wrong models. Notably, the complementing 
and thinning procedures detect a departure from the correct model earlier than the classical rescal- 
ing (Figure H)3). For comparison, also the naive implementation of the rescaling transformation is 
shown. The significance level for the KS test used for the naive time-rescaling was adjusted to 
a = .015 to achieve a 95% specificity. The adjustment was necessary due to the discretization bias 
(see section l2~3l ). 

For models with an intermediate jitter strength (/3 = 12), ROC curves were constructed. Here, for a 
given significance level a, a pair of true and false positive rates can be calculated and plotted for each 
test (taking N = 1000 repetitions using the true model and the model with jittered coefficients). It 
can be seen that especially for intermediate jitter strengths, complementing and thinning outperform 
time-rescaling (Figure[4]rj), independent of the chosen significance level. 



3.2 Example: Renewal process 

In a second example, we consider renewal processes, i. e. inter-spike intervals are an i. i. d. sample 
from a specific probability distribution p(At). In this case, the conditional intensity is given by 

X(t\H t ) — — fft** where i* denotes the time of the last spike prior to time t. For this 

1-J p(u)du 

example, we chose the Gamma distribution as it is commonly used to model real spike trains 0,[H0]. 
The spike train was generated from a true model, following a Gamma distribution with scale param- 

_ At 

eter A = 0.032 and shape parameter B = 6.25: p(At) — (At) B ^ 1 -^jTb) • Wrong models were 
generated by scaling the shape and scale parameter by a factor of 1 + f3 ("jitter") while keeping the 
expected value of the distribution constant (i. e. B' = (1 + /3)B, A' = (1 + A) (Figure^)- 
For each jitter strength, N = 1000 data sets of length T = 20 s were generated from the true model 
and the wrong model and the tests were applied. 
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Figure 5: Renewal process. (A) Inter-spike interval distributions for the undistorted (black line) and 
distorted models (medium jitter, /? = 0.5 and strong jitter, (3 — 1.0). For comparison, a sample ISI 
histogram from one of the simulations is shown in gray. Note that the mean of the three distributions 
is matched to be the same (vertical dashed line). (B) The test power of each test as a function of the 
jitter strength. The dashed line indicates the level of the medium jitter strength (red line in figure A). 
(C) ROC curve analysis for an intermediate jitter strength of (3 = 0.5. The intersection of the curves 
with the dashed line corresponds to the test power at a significance level of a = .05. 



The analysis of test power for each test and the ROC curve analysis for an intermediate jitter strength 
reveal that time-rescaling is slightly superior to thinning and complementing (Figure^ and C). The 
naive time-rescaling performs worst (adjusted significance level for the KS test, a — .017). 



3.3 Example: Inhomogeneous Spike Response Model 

We model an inhomogeneous spike response model with escape noise using a Bernoulli-GLM l2lll . 
The spiking probability is modulated by an inhomogeneous rate r(t). Additionally, for each spike, 
a post-spike kernel is added to the process intensity. The rate function is modeled like in the first 

example as a band-limited function r ti — r(£j) = ^2jZi° u j sm ^^ t \ T '^ T ^ with / = 1 Hz 

and J = 40 coefficients that were randomly drawn from a uniform distribution on the interval 
[—0.2, 0.2]. The post-spike kernel r)(At) is modeled as a sum of three exponential functions (r = 
5 ms, 25 ms and 1 s) with appropriate amplitudes as to mimick a relative refractory period, a small 
rebound and a slow (inhibitory) adaptation. To construct the Bernoulli-GLM, the spiking probability 
Pi per bin of length A = 1 ms is p t = 1+ex ^_ Si) with s< = -3 + r ti + Y,{ U] }<u ~ k). 

A binary time series (the spike train) was generated for a duration of T = 20 s. The jittered models 
were constructed by adding a jitter j3 on the coefficients of the inhomogeneous rate modulation 
(Figure |6]\). For each jitter strength, N = 1000 data sets were generated from the true model and 
the wrong model and the tests were applied. 

Both thinning and complementing are able to detect smaller distortions than both the time-rescaling 
on the surrogate and discrete data (Figure |6j3, adjusted significance level for the naive rescaling, 
a = .018). A ROC curve analysis for an intermediate jitter strength (]3 = 0.4) supports this finding 
(FigureUt). 



4 Discussion 



Assessing goodness-of-fit for Generalized Linear Models has mostly been done by applying the 
time-rescaling transformation that is defined for point processes, assuming a match between those 
approaches. When the per-bin probability of spiking cannot be regarded as low, this approximation 
breaks down and creates a bias when applying the time-rescaling transformation [17]. In a first 
step, we proposed a procedure to create surrogate point processes from discretized models, such as 
Bernoulli- and Poisson-GLMs, that do not exhibit this bias. Throughout all the examples, the time- 
rescaling theorem applied to the surrogate point process was systematically better than applying the 
naive time-rescaling on the discrete data. Since only the adjusted time-rescaling procedure allows 
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Figure 6: Inhomogeneous Spike Response Model. (A) Sample intensity functions for an undistorted 
intensity (black line) and two misspecified models (medium jitter, (3 — 0.4 and strong jitter, /3 = 
1.0). (B) The test power of each test as a function of the jitter strength. The dashed line indicates the 
level of the medium jitter strength (red line in figure A). (C) ROC curve analysis for an intermediate 
jitter strength of j3 = 0.4. The intersection of the curves with the dashed line corresponds to the test 
power at a significance level of a = .05. 



to reliably control the specificity of the test, it should be preferred over the classical time-rescaling 
in all cases where discretized models are used. 

We have presented two alternatives to an application of the time-rescaling theorem: For the first 
procedure, the observed spike train is thinned according to the value of the conditional intensity at 
the time of spikes. The resulting process is then a homogeneous Poisson process with a rate that is 
equal to the lower bound on the conditional intensity. The second proposed method builds on the 
idea that an intensity function A(t) with an upper bound C can be filled up to a homogeneous Poisson 
process of rate C by adding spike samples from the complementary process C — A(t). The proposed 
tests work best if the lower and upper bounds are tight. However, in most practical cases, especially 
the lower bound will be prohibitively low to apply any statistical test on the thinned process. As a 
remedy, we proposed to consider only regions of X(t\H(t)) for which the intensity exceeds a given 
threshold and repeat the thinning for different thresholds. This successfully overcomes the limitation 
that may have - up to now - prevented the use of the thinning algorithm as a goodness-of-fit measure 
for neural models. 

The three tests are complementary in the sense that they are sensitive to different deviations of the 
modeled and true intensity function. Time-rescaling is only sensitive to the total integral of the 
intensity function between spikes, while thinning exclusively considers the intensity function at the 
time of spikes and is insensitive to its value at places where no spikes occurred. Complementing is 
sensitive to the exact shape of X(t) regardless of where the spikes from the original observations are. 

For the examples of an inhomogeneous Poisson process and the Spike Response Model, thinning 
and complementing outperform the sensitivity of the simple time-rescaling procedure. They can 
detect deviations from the model that are only half as large as the ones necessary to alert the test 
based on time-rescaling. For modeling renewal processes, time-rescaling was slightly advantageous 
compared to the to other methods. This should not come as a surprise since the time-rescaling test 
is known to be sensitive to modeling the distribution of inter-spike intervals yj- 

Beside from likelihood criteria lfl2l l22l l23|l . there exist few goodness-of-fit tools for neural mod- 
els based on Generalized Linear Models ll2l l24ll . With the proposed procedure for surrogate point 
processes, we bridge the gap between such discrete models and point processes. That enables to 
make use of additional tests from this domain, such as thinning and complementing procedures. We 
expect these to be valuable contributions to the general practice of statistical evaluation in modeling 
single neurons as well as neural populations. 
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